\documentclass[t,12pt,aspectratio=169]{beamer} % 16:9 widescreen, suited to modern projectors
\usepackage{ctex} % Chinese language support
\usepackage{amsmath, amssymb} % math formulas and symbols
\usepackage{graphicx, color}
\usepackage{url}
\usepackage{verbatim}

% Theme settings (a clean style is recommended)
\usetheme{Madrid}
\usecolortheme{default} % alternatives: seahorse, beaver, dolphin, etc.

\title{Chapter 5: Statistics and Their Distributions}
\author{MSS ET AL}
\date{March 2018}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}

\begin{frame}
  \titlepage
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{Key Topics in This Chapter}

\begin{itemize}
    
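\item  Population and sample; simple random sample

\item  Empirical distribution function, Glivenko's theorem, frequency table

\item  Basic statistics: sample mean and sample variance

\item  Moments, kurtosis and skewness, order statistics

\item  Quantiles, five-number summary, box plot

\item  Three important statistics: the $\chi^2$, $T$, and $F$ statistics

\item  Sufficient statistics, factorization theorem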

\end{itemize}

\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{Exercises for This Chapter}

\begin{itemize}

\item  (5.1) 1, 2, 3, 4, 5, 6.

\item  (5.2) 1, 2, 3, 4, 5, 6, 7.

\item  (5.3) 1, 3, 8, 9, 13, 14, 22, 24, 29, 35.

\item  (5.4) 1, 2, 3, 4, 5, 6, 7, 10, 13, 16, 19.

\item  (5.5) 1, 4, 7, 10.

\end{itemize}


\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

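Q: What is statistics? What is statistical learning?


\begin{itemize}
\item {\color{red}Statistics}: collecting and analyzing data that are affected by random factors.
\item {\color{red}The scope of statistics} includes survey sampling, design of experiments, regression analysis, multivariate statistical analysis, time series analysis, nonparametric statistics, Bayesian analysis, and more.
\item {\color{red}Statistical learning}: understanding existing data and finding a prediction function.
\end{itemize}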

\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


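Q: What is a population? What is an individual? What is a sample? What is the sample size?


\begin{itemize}
\item The population is the totality of the objects under study, or {\color{red}the distribution followed by that totality}. This distribution is generally unknown.
\item Each member of the population is called an individual. The $n$ individuals drawn at random are called {\color{red}a sample}, and $n$ is called the sample size.
\end{itemize}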



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


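Q: A brewery's bottled beer has a specified net content of 640 grams. Because of randomness, not every bottle can contain exactly 640 grams. Ten bottles are drawn at random from one brewery's production and their net contents are measured (in grams): 641, 635, 640, 637, 642, 638, 645, 643, 639, 640. Identify {\color{red}the population, the sample, and the sample size}.


A: Population: the net content $X$ of bottled beer produced by this brewery. Sample: $X_1,\cdots,X_n$. Sample size: $n=10$.
Observed values: $x_1,\cdots, x_n$.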

%
\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

Q: Give an example of a grouped sample. What are the advantages and disadvantages of grouped samples?


A:


\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


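Q: What is a {\color{red}simple random sample}?


\begin{itemize}
\item Randomness: every individual in the population is equally likely to be selected.
\item Independence: the value of each sampled unit is independent of the values of the others.
\item A simple random sample $X_1,\cdots,X_n$ is a collection of independent and identically distributed random variables, each $X_i$ having the same distribution as the population $X$.
\end{itemize}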



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


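Q: A batch contains $N$ products with defective rate $p$. When the population size $N$ is large and $n$ items are drawn at random without replacement with $n\ll N$, the sample can be treated approximately as a simple random sample. Why?


A: When $n\ll N$, removing a few items barely changes the composition of the batch, so
\begin{align*}
& P(X_k=x_k \,|\, X_1=x_1,\cdots,X_{k-1}=x_{k-1})  \\
&\approx P(X_k=x_k).
\end{align*}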



\end{frame}
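
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A worked check of the approximation for $k=2$, offered as a sketch. Assume that $Np$ is an integer (the number of defectives), and code $X_i=1$ if the $i$-th draw is defective and $X_i=0$ otherwise; this coding is introduced here for illustration.
\begin{align*}
P(X_2=1 \mid X_1=1) &= \frac{Np-1}{N-1} \approx p, \\
P(X_2=1 \mid X_1=0) &= \frac{Np}{N-1} \approx p, \\
P(X_2=1) &= p.
\end{align*}
Both conditional probabilities tend to $p$ as $N\to\infty$, so for $n\ll N$ the draws behave approximately like independent $b(1,p)$ variables.

\end{frame}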

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


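Q: A food factory produces canned drinks. Five cans are drawn at random from the production line, and their net weights (in grams) are 351, 347, 355, 344, 351.
Plot the {\color{red}empirical distribution function} of this sample. How does it differ from, and how is it related to, the {\color{red}population distribution function}?



\begin{itemize}
\item For a fixed real number $x$, the value $F_n(x)$ of the empirical distribution function $F_n$ is a random variable. How should this be understood?
\item Glivenko's theorem: as the sample size grows, the empirical distribution function converges to the population distribution function uniformly in $x$ with probability 1, that is, $P\left(\lim_{n\to\infty}\sup_{x}|F_n(x)-F(x)|=0\right)=1$.
\end{itemize}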



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


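Q: What is the empirical distribution function?


\begin{itemize}
\item Let the observed sample values be $x_1, \cdots, x_n$.
\item For a fixed real number $x$, the empirical distribution function is $F_n(x)=\frac{k}{n}$, where $k$ is the number of observations in the sample that are less than or equal to $x$.
\item Different observed samples give different empirical distribution functions.
\end{itemize}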



\end{frame}
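
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A worked sketch for the canned-drink sample 351, 347, 355, 344, 351 (grams) from the earlier question. Sorted, the observations are 344, 347, 351, 351, 355, so
\[
F_n(x)=\begin{cases}
0, & x<344,\\
0.2, & 344\le x<347,\\
0.4, & 347\le x<351,\\
0.8, & 351\le x<355,\\
1, & x\ge 355.
\end{cases}
\]
For fixed $x$, $nF_n(x)$ counts the sample points with $X_i\le x$, so before the data are observed $nF_n(x)\sim b(n,F(x))$.

\end{frame}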

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


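Q: To study how productive the workers of a factory are at making a certain product, the daily output of 20 randomly chosen workers was recorded: 160, 196, 164, 148, 170, 175, 178, 166, 181, 162, 161, 168, 166, 162, 172, 156, 170, 157, 162, 154.
Group this sample: choose the class width, determine the class intervals, and build the {\color{red}frequency table} (counts and relative frequencies).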


%答：


\end{frame}
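
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

One possible grouping, as a sketch. The choices of 5 classes, class width 10 (range $196-148=48$), and starting point 147 are assumptions made here; other reasonable choices give different tables.

\begin{center}
\small
\begin{tabular}{ccccc}
\hline
Class & Interval & Count & Relative frequency & Cumulative frequency \\
\hline
1 & $(147,157]$ & 4 & 0.20 & 0.20 \\
2 & $(157,167]$ & 8 & 0.40 & 0.60 \\
3 & $(167,177]$ & 5 & 0.25 & 0.85 \\
4 & $(177,187]$ & 2 & 0.10 & 0.95 \\
5 & $(187,197]$ & 1 & 0.05 & 1.00 \\
\hline
Total & & 20 & 1.00 & \\
\hline
\end{tabular}
\end{center}

\end{frame}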

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


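Q: How can sample data be displayed graphically? (R programs)


\begin{itemize}
\item Histogram.
\item Stem-and-leaf plot.
\item Comparing two samples: histograms, back-to-back stem-and-leaf plots.
\item Other methods.
\end{itemize}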



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


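Q: What is a statistic?
% What is a sampling distribution?
What is the observed value of a statistic?


\begin{itemize}
\item A {\color{red}statistic} $T=T(X_1,\cdots,X_n)$ is a function of the sample.
\item Because each $X_i$ is a random variable (with the same distribution as the population $X$), the statistic $T$ is also a random variable.
\item Substituting the {\color{red}observed sample values} $x_1,\cdots, x_n$ gives the {\color{red}observed value of the statistic}, $t=T(x_1,\cdots,x_n)$.
\item The function that defines a statistic must not involve unknown parameters.
\end{itemize}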



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


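Q: An organization collected the monthly entertainment spending of 20 young people. Write down the {\color{red}sample mean} of this sample. Suppose the data are (in yuan):
79, 84, 84, 88, 92, 93, 94, 97, 98, 99, 100, 101, 101, 102, 102, 108, 110, 113, 118, 125. Compute the {\color{red}observed value of the sample mean}.


\begin{itemize}
\item The sample mean is a statistic. Make sure you understand why.
\item The sum of squared deviations of the observations from a constant $c$, $\sum_{i=1}^n (x_i-c)^2$, is smallest when $c$ is the sample mean. Prove this.
\end{itemize}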


\end{frame}
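
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A worked sketch for the spending data: the 20 observations sum to 1988, so $\bar{x}=1988/20=99.4$ (yuan).

For the minimization claim, take any real number $c$ (a symbol introduced here). Then
\[
\sum_{i=1}^n (x_i-c)^2=\sum_{i=1}^n (x_i-\bar{x})^2+n(\bar{x}-c)^2 \ge \sum_{i=1}^n (x_i-\bar{x})^2,
\]
because the cross term $2(\bar{x}-c)\sum_{i=1}^n(x_i-\bar{x})$ vanishes. Equality holds only when $c=\bar{x}$.

\end{frame}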

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


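Q: A population $X$ has sample $X_1,\cdots,X_n$, and $\bar{X}$ is the sample mean.
If the population is normal, $X\sim N(\mu,\sigma^2)$, what is the distribution of the sample mean?


\begin{itemize}
\item The sample mean is $\bar{X} := (X_1+\cdots+X_n)/n$.
\item Because $X_1,\cdots, X_n$ are mutually independent and each follows the normal distribution $N(\mu,\sigma^2)$,
we have $\bar{X}\sim N(\mu,\sigma^2/n)$.
\end{itemize}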



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }



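Q: A population $X$ has sample $X_1,\cdots,X_n$, and $\bar{X}$ is the sample mean.
If the population distribution is unknown, with finite $E(X)=\mu$ and $\mathrm{Var}(X)=\sigma^2>0$, what is the {\color{red}asymptotic distribution} of the sample mean?



\begin{itemize}
\item Because $X_1,\cdots, X_n$ are independent and identically distributed with finite variance, the central limit theorem applies.
\item By the central limit theorem, $\sqrt{n}(\bar{X}-\mu)/\sigma$ converges in distribution to $N(0,1)$; for large $n$, $\bar{X}$ is approximately distributed as $N(\mu,\sigma^2/n)$.
\end{itemize}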





\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }




Q: Write down the two basic statistics.


\begin{itemize}
\item {\color{red}Sample mean}: $\bar{X}=\frac{1}{n}\sum_{i=1}^{n}X_i$.
\item Biased sample variance: $S_n^2=\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2$.
\item {\color{red}Unbiased sample variance}: $S^2=\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$.
\end{itemize}



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


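Q: Suppose the population $X$ has a finite second moment, so that $E(X)=\mu$ and $\mathrm{Var}(X)=\sigma^2$ exist. Let $X_1,\cdots,X_n$ be a sample, and let $\bar{X}$ and $S^2$ be the sample mean and the sample variance. Find $E(\bar{X})$, $\mathrm{Var}(\bar{X})$, and $E(S^2)$. (You should be able to prove these.)



\begin{itemize}
\item $E(\bar{X})=\mu$.
\item $\mathrm{Var}(\bar{X})=\sigma^2/n$.
\item $E(S^2)=\sigma^2$.
\end{itemize}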



\end{frame}
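
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A sketch of the proofs, using only independence and the common moments $E(X_i)=\mu$, $\mathrm{Var}(X_i)=\sigma^2$:
\begin{align*}
E(\bar{X}) &= \frac{1}{n}\sum_{i=1}^n E(X_i)=\mu, \qquad
\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}(X_i)=\frac{\sigma^2}{n},\\
E\Big[\sum_{i=1}^n (X_i-\bar{X})^2\Big] &= \sum_{i=1}^n E(X_i^2)-nE(\bar{X}^2) \\
&= n(\sigma^2+\mu^2)-n\Big(\frac{\sigma^2}{n}+\mu^2\Big)=(n-1)\sigma^2.
\end{align*}
Dividing the last line by $n-1$ gives $E(S^2)=\sigma^2$.

\end{frame}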

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


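Q: Write down {\color{red}more statistics}. Compute or plot them in R.


\begin{itemize}
\item Sample raw moments and sample central moments.
\item Sample skewness and sample kurtosis.
\item Order statistics. Sample quantiles and the median.
\item Five-number summary and box plot.
\end{itemize}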


\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


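Q: What are {\color{red}order statistics}?


\begin{itemize}
\item Given a sample $X_1,\cdots, X_n$, arrange it in increasing order and write the result as $X_{(1)},\cdots, X_{(n)}$. These are called the order statistics.
\item $X_{(1)}=\min\{X_1,\cdots, X_n\}$.
\item $X_{(n)}=\max\{X_1,\cdots, X_n\}$.
\item $X_{(k)}$ is a function of the $n$-dimensional random vector $(X_1,\cdots, X_n)$.
\end{itemize}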


\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }



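Q: Let the population $X$ have density function $p(x)$ and distribution function $F(x)$, and let $X_1,\cdots, X_n$ be a sample. Find the density function of the $k$-th order statistic.


\begin{itemize}
\item Let $F_k(x)$ be the distribution function of the $k$-th order statistic.
\item For small $\Delta x$, $F_k(x+\Delta x)-F_k(x) \approx$ the probability that $k-1$ observations fall in $(-\infty,x]$, exactly one falls in $(x,x+\Delta x]$,
and $n-k$ fall in $(x+\Delta x,+\infty)$.
\end{itemize}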


\end{frame}
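
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A sketch of how the limit is taken; the symbol $p_k$ for the density of $X_{(k)}$ is notation introduced here. Counting the three groups with a multinomial probability,
\[
F_k(x+\Delta x)-F_k(x)\approx \frac{n!}{(k-1)!\,1!\,(n-k)!}\,[F(x)]^{k-1}\,[p(x)\Delta x]\,[1-F(x+\Delta x)]^{n-k}.
\]
Dividing by $\Delta x$ and letting $\Delta x\to 0$ gives
\[
p_k(x)=\frac{n!}{(k-1)!\,(n-k)!}\,[F(x)]^{k-1}\,p(x)\,[1-F(x)]^{n-k}.
\]

\end{frame}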

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }



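Q: Let the population density be $p(x)=3x^2$, $0<x<1$. A sample of size 5 is drawn. Find the probability $P(X_{(2)}<0.5)$.


\begin{itemize}
\item Find the distribution function and the density function of the random variable $X_{(2)}$.
\item Alternative solution: $P(X_{(2)}<0.5)$ equals the probability that at least two of the five observations are less than 0.5.
\end{itemize}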




\end{frame}
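
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A worked sketch of the alternative solution. Here $F(x)=x^3$ on $(0,1)$, so each observation falls below $0.5$ with probability $F(0.5)=1/8$. The number of such observations among the five follows $b(5,1/8)$, hence
\begin{align*}
P(X_{(2)}<0.5) &= 1-\left(\tfrac{7}{8}\right)^5-5\cdot\tfrac{1}{8}\left(\tfrac{7}{8}\right)^4 \\
&= 1-\tfrac{16807}{32768}-\tfrac{12005}{32768}=\tfrac{3956}{32768}\approx 0.1207.
\end{align*}

\end{frame}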

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }




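Q: How should the three major sampling distributions be understood?


\begin{itemize}
\item Assume the population is normal. The distributions of {\color{red}three well-known statistics} are called the {\color{red}three major sampling distributions}.
\item Definitions of the three statistics: $\chi^2=?$, $T=?$, $F=?$.
\item Names and degrees of freedom: $\chi^2(n)$, $t(n)$, $F(m,n)$.
\item Use R to compute quantiles and to plot density functions.
\end{itemize}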



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


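Q: Let the population $X$ be normal, $N(0,\sigma^2)$, with simple random sample $(X_1,\cdots,X_n)$. Find the distribution of the statistic
$W=X_1^2+\cdots+X_n^2$.


\begin{itemize}
\item $X_k/\sigma$ follows the standard normal distribution $N(0,1)$.
\item $W/\sigma^2$ follows the $\chi^2$ distribution with $n$ degrees of freedom.
\item The density function of $W$ follows from the density function of the $\chi^2$ distribution.
\end{itemize}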



\end{frame}
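
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A sketch of the last step; the names $Y$, $p_Y$, and $p_W$ are introduced here for illustration. Write $Y=W/\sigma^2\sim\chi^2(n)$, whose density is
\[
p_Y(y)=\frac{1}{2^{n/2}\Gamma(n/2)}\,y^{n/2-1}e^{-y/2},\qquad y>0.
\]
The change of variable $w=\sigma^2 y$ then gives
\[
p_W(w)=\frac{1}{\sigma^2}\,p_Y\!\left(\frac{w}{\sigma^2}\right)
=\frac{1}{(2\sigma^2)^{n/2}\Gamma(n/2)}\,w^{n/2-1}e^{-w/(2\sigma^2)},\qquad w>0.
\]

\end{frame}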

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


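Q: What is the ``fundamental theorem'' of mathematical statistics?


A: Let the population $X$ be normal, $N(\mu,\sigma^2)$, with simple random sample $(X_1,\cdots,X_n)$. Let $\bar{X}$ and $S^2$ be the sample mean and the sample variance. Then the following hold (you should be able to prove them):

\begin{itemize}
\item The sample mean $\bar{X}$ and the sample variance $S^2$ are independent.
\item The sample mean $\bar{X}$ follows the normal distribution $N(\mu,\sigma^2/n)$.
\item The rescaled sample variance follows a $\chi^2$ distribution: $(n-1)S^2/\sigma^2\sim\chi^2(n-1)$.
\end{itemize}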


\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


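Q: For a normal population, prove that the sample mean and the sample variance are independent. % Prove that the sample mean follows the normal distribution $N(\mu,\sigma^2/n)$.


\begin{itemize}
\item Apply an orthogonal transformation to the sample $(X_1,\cdots,X_n)$. The resulting $n$ random variables $(Y_1,\cdots,Y_n)$ are again mutually independent normal variables.
\item The sample mean depends only on $Y_1$, while the sample variance depends only on $(Y_2,\cdots,Y_n)$. This proves the claim.
\end{itemize}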




\end{frame}
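
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A sketch of the construction. Assume an orthogonal matrix $A$ whose first row is $(1/\sqrt{n},\cdots,1/\sqrt{n})$; the symbol $A$ is introduced here. Put $Y=AX$ with $X=(X_1,\cdots,X_n)^{\top}$. Then
\begin{align*}
Y_1 &= \sqrt{n}\,\bar{X}, \qquad \sum_{i=1}^n Y_i^2=\sum_{i=1}^n X_i^2,\\
(n-1)S^2 &= \sum_{i=1}^n X_i^2-n\bar{X}^2=\sum_{i=2}^n Y_i^2.
\end{align*}
Since $Y$ is jointly normal with $\mathrm{Cov}(Y)=\sigma^2AA^{\top}=\sigma^2I$, the components $Y_1,\cdots,Y_n$ are independent, so $\bar{X}$ and $S^2$ are independent.

\end{frame}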

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


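Q: What is the $\chi^2$ distribution?


\begin{itemize}
\item $\chi^2:=\sum_{i=1}^nX_i^2$, where the $X_i$ are iid with $X_i\sim N(0,1)$.
\item Find the density function of the $\chi^2$ statistic and its quantile $\chi^2_\alpha(n)$.
\item Find the expectation and the variance of the $\chi^2$ statistic.
\item $\chi^2\sim \chi^2(n)$; this is the name of the distribution.
\end{itemize}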



\end{frame}
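
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A sketch of the expectation and variance, using the standard normal moments $E(X_i^2)=1$ and $E(X_i^4)=3$ together with independence:
\begin{align*}
E(\chi^2) &= \sum_{i=1}^n E(X_i^2)=n,\\
\mathrm{Var}(\chi^2) &= \sum_{i=1}^n \mathrm{Var}(X_i^2)
=\sum_{i=1}^n\left[E(X_i^4)-\big(E(X_i^2)\big)^2\right]=n(3-1)=2n.
\end{align*}

\end{frame}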

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


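Q: What is the $F$ distribution?


\begin{itemize}
\item $F:=\frac{X_1/m}{X_2/n}$, where $X_1\sim\chi^2(m)$, $X_2\sim\chi^2(n)$, and $X_1$ and $X_2$ are independent.
\item Find the density function of the $F$ statistic and its quantile $F_\alpha(m,n)$.
\item $F\sim F(m,n)$; this is the name of the distribution.
\end{itemize}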



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


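Q: What is the $t$ distribution?


\begin{itemize}
\item $T:=\frac{X_1}{\sqrt{X_2/n}}$, where $X_1\sim N(0,1)$, $X_2\sim\chi^2(n)$, and $X_1$ and $X_2$ are independent.
\item Find the density function of the $T$ statistic and its quantile $t_\alpha(n)$.
\item $T\sim t(n)$; this is the name of the distribution.
\end{itemize}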



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


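Q: What is the basic principle behind the $t$ test?


\begin{itemize}
\item See the conclusion of Corollary 5.4.2: for a normal population, $T=\frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t(n-1)$.
\item If the variance is unknown, use the statistic $T$. This is the most common situation.
\item If the variance is known, use the statistic $U=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$.
\end{itemize}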



\end{frame}
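
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A sketch of why $T\sim t(n-1)$ for a normal population, combining the fundamental theorem with the definition of the $t$ distribution:
\[
T=\frac{\bar{X}-\mu}{S/\sqrt{n}}
=\frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}}.
\]
The numerator is $N(0,1)$, the quantity $(n-1)S^2/\sigma^2$ is $\chi^2(n-1)$, and the two are independent. The unknown $\sigma$ cancels, which is why $T$ can be used when the variance is unknown.

\end{frame}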

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


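Q: Let $X_1,\cdots,X_n$ be a simple random sample from the normal population $N(\mu,\sigma^2)$. How can a statistic be constructed to estimate the parameter $\sigma$?
Which of the following two statistics is better?
\begin{itemize}
\item $S=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i-\bar{X})^2 }$;
\item $d=\sqrt{\frac{\pi}{2}}\frac{1}{n} \sum_{i=1}^n |X_i-\bar{X}|$.
\end{itemize}


A: $S$ contains all the information in the sample about $\sigma$; it is a {\color{red}sufficient statistic}, whereas $d$ is not.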



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


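Q: To study a shooter's hit rate $\theta$, we observed 10 shots: the third and the sixth missed, and all the others hit.
Let $T=X_1+\cdots+X_n$ with $n=10$, where $X_i=1$ means the $i$-th shot hit and $X_i=0$ means it missed. Then $T$ is a sufficient statistic for the parameter $\theta$.


Proof: For any integer $0\le t\le n$ and any $x_1,\cdots,x_n\in\{0,1\}$ with $t=\sum_{i=1}^n x_i$, a direct computation shows that the following conditional probability does not depend on $\theta$. For this reason $T$ is called a {\color{red}sufficient statistic}.
\[P\left[ X_1=x_1,\cdots,X_n=x_n \,\big|\, T=t \right] =1/\binom{n}{t}.\]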


\end{frame}
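
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A sketch of the computation. When $\sum_{i=1}^n x_i=t$, the event $\{X_1=x_1,\cdots,X_n=x_n\}$ is contained in $\{T=t\}$, and $T\sim b(n,\theta)$, so
\[
P\left[ X_1=x_1,\cdots,X_n=x_n \,\big|\, T=t \right]
=\frac{\theta^{t}(1-\theta)^{n-t}}{\binom{n}{t}\theta^{t}(1-\theta)^{n-t}}
=\frac{1}{\binom{n}{t}}.
\]
For the observed shots ($n=10$, $t=8$) this equals $1/\binom{10}{8}=1/45$, whatever the value of $\theta$.

\end{frame}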

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


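Q: Let $X_1,\cdots,X_n$ be a simple random sample from the normal population $N(\mu,1)$. Prove that the sample mean $T:=\bar{X}$ is a sufficient statistic for the parameter $\mu$.


A: Verify by computation that the following conditional density does not depend on the parameter $\mu$:
\[
p_{X_1,\cdots,X_n \mid T}(x_1,\cdots,x_n \mid t) = \frac{p_{X_1,\cdots,X_n,T}(x_1,\cdots,x_n,t)}{p_T(t)}.
\]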




\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


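Q: What is the definition of a sufficient statistic? What is the sufficiency principle?


\begin{itemize}
\item Definition of a sufficient statistic: given the observed value of the sufficient statistic, the conditional distribution of the sample no longer depends on the unknown parameter.
(The population distribution contains an unknown parameter.)
\item Whenever a sufficient statistic exists, any statistical inference can be based on it, which simplifies the inference procedure.
\end{itemize}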



\end{frame}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


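Q: How can one decide whether a statistic is sufficient? Let the population have probability function $f(x;\theta)$ (a probability mass function or a density), and let $X_1,\cdots,X_n$ be a sample. Find a necessary and sufficient condition for $T=T(X_1,\cdots,X_n)$ to be a sufficient statistic.


A: {\color{red}Neyman's necessary and sufficient condition} (the factorization theorem): there exist two functions $g(t;\theta)$ and $h(x_1,\cdots,x_n)$ such that, for every observed sample $x_1,\cdots,x_n$ and the corresponding observed value $t=T(x_1,\cdots,x_n)$ of the statistic, the joint probability function factors as
\[f(x_1,\cdots,x_n;\theta) = g(t;\theta)h(x_1,\cdots,x_n).\]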


\end{frame}
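
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

As a sketch of how the factorization is used, consider the normal population $N(\mu,1)$ with $t=\bar{x}$. Since $\sum_{i=1}^n(x_i-\mu)^2=\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\mu)^2$,
\[
f(x_1,\cdots,x_n;\mu)
=\underbrace{\exp\left\{-\tfrac{n}{2}(t-\mu)^2\right\}}_{g(t;\mu)}
\cdot
\underbrace{(2\pi)^{-n/2}\exp\Big\{-\tfrac{1}{2}\sum_{i=1}^n(x_i-\bar{x})^2\Big\}}_{h(x_1,\cdots,x_n)}.
\]
The second factor does not involve $\mu$, so $\bar{X}$ is a sufficient statistic for $\mu$.

\end{frame}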

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }


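Q: Let the population be uniform, $X\sim U(0,\theta)$, with simple random sample $X_1,\cdots,X_n$. Prove that the maximum $T=\max\{X_1,\cdots,X_n\}$ is a sufficient statistic for the parameter $\theta$.


\begin{itemize}
\item Write down the joint density function $f(x_1,\cdots,x_n;\theta)$.
\item Factor it into a product of two functions $h(x_1,\cdots,x_n)$ and $g(t;\theta)$, where $t=\max\{x_1,\cdots,x_n\}$.
\end{itemize}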


\end{frame}
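
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{frame}{5. }

A sketch of the factorization, writing $I\{\cdot\}$ for an indicator function (notation introduced here):
\[
f(x_1,\cdots,x_n;\theta)=\prod_{i=1}^n\frac{1}{\theta}\,I\{0<x_i<\theta\}
=\underbrace{\theta^{-n}\,I\{t<\theta\}}_{g(t;\theta)}
\cdot
\underbrace{I\{\min\nolimits_i x_i>0\}}_{h(x_1,\cdots,x_n)}.
\]
All $x_i$ are below $\theta$ exactly when their maximum $t$ is, so the parameter enters only through $t$. By the factorization theorem, $T$ is sufficient for $\theta$.

\end{frame}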

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\end{document}




